1.  Introduction


Sky cover records are available for many stations by month and by time of day.
To obtain the climatic probability of a sky cover condition at a specific location
for a given day and hour of the day, one must at present retrieve these records
and form an empirical estimate, which is a slow, cumbersome and costly process.
The purpose of this report is to document some efforts to compact some of these
data by the use of analytical models with a limited number of parameters, so as
to make rapid recall and reuse possible.

In  this  report,  we  develop  models  for  seven  weather  stations.  The  data  used 
to  develop  the  models  was  extracted  from  the  "Revised  Uniform  Summary  of  Weather 
Observations"  (RUSSWO's)  prepared  by  the  Data  Processing  Branch  of  the  Air  Weather 
Service. For each station, 96 separate models were first developed, one for each
three-hour period of the day in each of the twelve months. These were then
condensed into a single model for each station.
2.  Development of the 96 Individual Models


The basic model used for cloud cover is the Johnson $S_B$ family of distributions.
Using these, the empirical distribution is transformed to a standard normal
variate (equivalent normal deviate). An advantage of such a transformation is
that estimates of the percentiles of the fitted distribution can be obtained
from a table of areas under the standard normal distribution.

The $S_B$ family is given by

$$z = \gamma + \eta \ln[x/(1-x)]$$

where γ and η are constants determined by the data. The value of x is the proportion
of the sky that is covered (end point of the interval). The variable Z is the
standardized normal variable (equivalent normal deviate). The probability of
sky cover less than or equal to $x_0$ is given by the area under the standardized
normal curve below the value $z_0$. That is,

$$P(X \le x_0) = P(Z \le z_0).$$
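
The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the report: the function names are hypothetical, and the standard normal distribution function is evaluated with the error function rather than read from a table.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sky_cover_probability(x0: float, gamma: float, eta: float) -> float:
    """P(X <= x0) under the Johnson S_B model z = gamma + eta * ln[x / (1 - x)].

    x0 is a proportion of sky covered and must lie strictly between 0 and 1,
    because the logarithm is undefined at the end points.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    z0 = gamma + eta * math.log(x0 / (1.0 - x0))
    return normal_cdf(z0)


if __name__ == "__main__":
    # Parameter values quoted in Section 4 for Bedford, November, 6 A.M.
    print(round(sky_cover_probability(0.85, gamma=-0.339, eta=0.217), 2))
```

With the Bedford values quoted in Section 4, the call returns approximately 0.51, in agreement with the worked example there.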

The data in the RUSSWO's for a given station are by month and by three-hour
period of the day. Thus 96 different models were first developed, one for each time
of day for each month of the year. There are 11 categories of observed sky cover,
designated 0, .1, .2, ..., 1.0. The interior boundaries between the eleven
categories of sky cover were taken to be .05, .15, ..., .95.

For a given month and time of day, the values of γ and η were obtained by simple
linear regression: the value of Z corresponding to the tabulated proportion of sky
cover less than an interior boundary value x is regressed against ln[x/(1-x)] for
that x. The 96 sets of values of γ and η for each station obtained in this manner
are tabulated in Section 5. Also given are the RMS (root mean square) error and a
table giving the frequencies of different magnitudes of errors. The RMS is defined as

$$\mathrm{RMS} = \sqrt{\frac{\text{sum of squared deviations}}{\text{number of observations}}}.$$

By  deviation,  we  mean  the  difference  between  the  observed  cumulative  frequency 
and  the  cumulative  frequency  obtained  from  our  model. 
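
The fitting step and the RMS calculation can be sketched as follows. This is an illustration under stated assumptions, not the original program: the function names are hypothetical, the regressor is taken to be ln[x/(1-x)] as the form of the model implies, and every observed cumulative frequency is assumed to lie strictly between 0 and 1 so that its equivalent normal deviate exists.

```python
import math
from statistics import NormalDist

# Interior boundaries between the eleven sky cover categories: .05, .15, ..., .95
BOUNDARIES = [0.05 + 0.1 * k for k in range(10)]


def fit_johnson_sb(boundaries, cum_freqs):
    """Least-squares fit of z = gamma + eta * ln[x / (1 - x)].

    cum_freqs[i] is the observed proportion of sky cover below boundaries[i].
    Returns the pair (gamma, eta).
    """
    std_normal = NormalDist()
    u = [math.log(x / (1.0 - x)) for x in boundaries]
    # inv_cdf raises StatisticsError if a frequency is exactly 0 or 1.
    z = [std_normal.inv_cdf(p) for p in cum_freqs]
    n = len(u)
    u_bar = sum(u) / n
    z_bar = sum(z) / n
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxz = sum((ui - u_bar) * (zi - z_bar) for ui, zi in zip(u, z))
    eta = sxz / sxx
    gamma = z_bar - eta * u_bar
    return gamma, eta


def rms_error(boundaries, cum_freqs, gamma, eta):
    """Root mean square difference between observed and fitted cumulative frequencies."""
    std_normal = NormalDist()
    total = 0.0
    for x, p in zip(boundaries, cum_freqs):
        fitted = std_normal.cdf(gamma + eta * math.log(x / (1.0 - x)))
        total += (p - fitted) ** 2
    return math.sqrt(total / len(boundaries))
```

Note that the regression minimizes squared error in Z, while the RMS is measured on the cumulative frequencies themselves, which is how the deviation is defined above.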

We chose the Johnson distribution because of its ease of use and the many shapes
which it can assume. The beta distribution was also considered as a possible
model for sky cover. It is a two-parameter model whose distribution function is
the incomplete beta function ratio:

$$P(X \le x_0) = \frac{1}{B(a,b)} \int_0^{x_0} t^{a-1} (1-t)^{b-1} \, dt.$$

To test the applicability of the beta model, the method of moments was used to
calculate the 96 sets of beta parameters for the Patrick Air Force Base data. The
curves of cloud cover for Patrick were usually U-shaped, with the estimates for a
ranging from .214 to 1.281 and the estimates for b ranging from .299 to .863. The
only readily available computer routine designed to evaluate the incomplete beta
distribution with parameters of this size was the IMSL subroutine MDBETA, available
in Fortran IV. The programming was involved, since our initial beta work was in SAS.
The 96 sets of two parameters were used to compute the 960 expected values of the
cumulative distribution function (one for each value of x corresponding to an interior
boundary). The RMS for the Johnson curves was .0249; for the beta curves it was
.0223.
Thus, the beta curves gave a slightly better fit than the Johnson curves. The
relatively small increase in "goodness of fit" is offset by the more difficult
programming involved and, more important, by the fact that no good approximations
to the beta distribution function exist for small values of the parameters. For the
Johnson curves, standard normal tables are readily available and easy to use, and
many pocket calculators can directly evaluate the normal distribution function.
Except for the data for Patrick Air Force Base, the beta curves were abandoned in
favor of the Johnson model.
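
For reference, the method-of-moments step used for the beta comparison can be sketched as below. The report does not say how the sample mean and variance were formed from the RUSSWO categories, so the sketch takes them as given inputs; the function name is hypothetical, and the formulas are the standard moment-matching expressions for a beta distribution on (0, 1).

```python
def beta_method_of_moments(mean: float, variance: float):
    """Return beta parameters (a, b) whose mean and variance match the inputs.

    Uses mean = a / (a + b) and variance = a*b / ((a + b)**2 * (a + b + 1)).
    Estimates with a < 1 and b < 1 correspond to a U-shaped density.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    common = mean * (1.0 - mean) / variance - 1.0  # equals a + b
    if common <= 0.0:
        raise ValueError("variance is too large for a beta distribution")
    return mean * common, (1.0 - mean) * common
```

For example, a mean of .5 with a variance of .125 gives a = b = .5, a symmetric U-shaped density.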

3.  Development of the Overall Models

In  the  previous  section  we  outlined  the  development  of  individual  models  for 
each  month  for  each  time  of  day.  In  this  section  we  develop  a single  overall  model 
for  each  station  which  is  valid  for  all  times  of  day  and  all  days  of  the  year. 

Each of the 96 pairs of estimates of γ and η is valid for a specific time of
day and month of the year. We shall attempt to regress each of γ and η on functions
of time of day and month (day) of the year. If we can do this, then, given the day
of the year and the hour of the day, we can calculate estimates for γ and η, which
can be used to calculate

$$Z = \gamma + \eta \ln[x/(1-x)].$$

For  such  regression  purposes,  we  let  H be  the  hour  of  the  day  and  let  it 
correspond  to  the  midpoint  of  the  three  hour  period.  In  a similar  fashion,  we 
chose  to  make  D the  day  of  the  year.  Labelling  the  days  of  the  year  from  1 to  365, 
we  assign  values  to  D corresponding  to  the  months  of  the  year  as  follows: 

Months Over Which
Data Has Been Amassed        D

Jan.                        15
Feb.                        45
Mar.                        74
Apr.                       105
May                        135
Jun.                       166
Jul.                       196
Aug.                       227
Sep.                       258
Oct.                       288
Nov.                       319
Dec.                       349

As a first step in the selection of terms for the comprehensive model, values
of γ for each three-hour period were plotted against month (D) of the year. A
separate linear regression of γ against D was run for each three-hour period, and
the residuals were examined both individually for each three-hour period and
collectively. The residuals indicated a cyclic effect, and overlaying the plots of
the residuals facilitated estimation of an appropriate phase shift (gamma day
phase shift) for a sine term to supplement the linear regression.

In  an  analogous  manner  estimates  for  gamma  hour  phase  shift,  eta  day  phase 
shift  and  eta  hour  phase  shift  were  obtained  for  additional  model  terms. 


In addition to the above terms, two further types of terms were used.
First, sine terms with periods of one half and one quarter of those of the model
terms described above were used. Second, certain "cross product" terms were added
to the model.


The postulated model, before terms were removed by stepwise regression,
included the following terms (non-cross product). (GD, GH, ED, EH refer to
gamma-day, gamma-hour, eta-day and eta-hour phase shifts respectively.)


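Term      Definition

GD365     $\sin[2\pi(D - \mathrm{GD})/365]$
GD182     $\sin[4\pi(D - \mathrm{GD})/365]$
GD91      $\sin[8\pi(D - \mathrm{GD})/365]$
GH24      $\sin[2\pi(H - \mathrm{GH})/24]$
GH12      $\sin[4\pi(H - \mathrm{GH})/24]$
GH6       $\sin[8\pi(H - \mathrm{GH})/24]$
ED365     $\sin[2\pi(D - \mathrm{ED})/365]$
ED182     $\sin[4\pi(D - \mathrm{ED})/365]$
ED91      $\sin[8\pi(D - \mathrm{ED})/365]$
EH24      $\sin[2\pi(H - \mathrm{EH})/24]$
EH12      $\sin[4\pi(H - \mathrm{EH})/24]$
EH6       $\sin[8\pi(H - \mathrm{EH})/24]$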

The cross product terms crossed terms beginning with G with each other, and terms
beginning with E with each other. (Of course, all the eta terms, that is, the terms
beginning with E, were multiplied by ln[x/(1-x)] in the stepwise regression model.)
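
The twelve non-cross-product sine terms can be generated as in the sketch below. The function name and the dictionary layout are hypothetical, and the sine arguments are taken in radians, which is an assumption the report does not state explicitly.

```python
import math


def sine_terms(day, hour, gd, gh, ed, eh):
    """Return the twelve candidate sine terms keyed by their names in the text.

    day, hour      -- D (day of the year) and H (hour of the day)
    gd, gh, ed, eh -- gamma-day, gamma-hour, eta-day and eta-hour phase shifts
    Arguments of the sines are treated as radians (an assumption of this sketch).
    """
    terms = {}
    for prefix, day_shift, hour_shift in (("G", gd, gh), ("E", ed, eh)):
        # Annual term and its half- and quarter-period harmonics.
        for label, k in (("365", 2), ("182", 4), ("91", 8)):
            terms[prefix + "D" + label] = math.sin(k * math.pi * (day - day_shift) / 365.0)
        # Daily term and its half- and quarter-period harmonics.
        for label, k in (("24", 2), ("12", 4), ("6", 8)):
            terms[prefix + "H" + label] = math.sin(k * math.pi * (hour - hour_shift) / 24.0)
    return terms


def cross_product(terms, first, second):
    """Cross product of two named terms, e.g. cross_product(t, "GD182", "GD91")."""
    return terms[first] * terms[second]
```

In a regression, the G terms would enter as regressors for γ directly, while each E term would first be multiplied by ln[x/(1-x)], as noted above.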


A stepwise procedure was used to determine which sine terms should be included
in the overall formula. For the largest models developed, the significance level
for entry of a sine term was 0.50 and for staying in the model was 0.05. The large
model was trimmed by omitting the terms which contributed least, and only the
reduced models appear in Section 5.

4.  Use  of  the  Models 


Suppose one wishes to find the probability of less than 85% sky cover at
Bedford on November 10, 1982 at 6 A.M. Using Model I, one would proceed as follows.

For Bedford in November at 6 A.M., γ = -.339, η = .217.

We have

$$
\begin{aligned}
Z &= \gamma + \eta \ln[x/(1-x)] \\
  &= -.339 + .217 \ln(.85/.15) \\
  &= .037,
\end{aligned}
$$

$$P(Z \le .037) = .51.$$

Using Model I, we have the probability of less than 85% sky cover on November 10
at 6 A.M. as 0.51.

Using Model II, D = 314, H = 6.
We have GD = 81, GH = 5.6, ED = 110, EH = 6.4.

$$
\begin{aligned}
\gamma &= -.336 + .00074(314) - .003(6) - .054 \sin\frac{2\pi(314-81)}{365} - .069 \sin\frac{4\pi(314-81)}{365} \\
&\quad - .203 \sin\frac{2\pi(6-5.6)}{24} - .062 \sin\frac{2\pi(314-81)}{365} \times \sin\frac{8\pi(314-81)}{365} \\
&\quad + .078 \sin\frac{4\pi(314-81)}{365} \times \sin\frac{8\pi(314-81)}{365} \\
&= -.336 + .232 - .018 - .004 - .010 - .000 + .003 \\
&= -.133.
\end{aligned}
$$

$$
\begin{aligned}
\eta &= .196 - .00005(314) + .002(6) + .067 \sin\frac{2\pi(314-110)}{365} + .066 \sin\frac{2\pi(6-6.4)}{24} \\
&= .196 - .016 + .012 + .004 + .000 \\
&= .196.
\end{aligned}
$$

We have

$$Z = -.133 + .196 \ln(.85/.15) = .207,$$

$$P(Z \le .207) = .58.$$

Using Model II gives a probability of .58.
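
The Model II calculation can be checked with the sketch below, which uses the Bedford coefficients and phase shifts tabulated in Section 5. The function name and the `degree_mode` flag are hypothetical. The flag exists because the individual term values printed in the worked example (for instance -.004 for the GD365 term) are reproduced only when each sine is evaluated with its argument read as degrees; with the argument in radians the terms come out differently. The report does not state which convention was used when the coefficients were fitted, so the flag is an assumption of this sketch rather than a statement about the original procedure.

```python
import math
from datetime import date
from statistics import NormalDist

# Bedford Model II phase shifts, copied from Section 5.
GD, GH, ED, EH = 81.0, 5.6, 110.0, 6.4


def bedford_model_ii(day, hour, x, degree_mode=False):
    """Return (gamma, eta, P(X <= x)) for Bedford under Model II.

    degree_mode=False evaluates the sines in radians.
    degree_mode=True passes the same numeric argument to a sine that expects
    degrees, which is what reproduces the term values printed in the example.
    """
    def s(arg):
        return math.sin(math.radians(arg)) if degree_mode else math.sin(arg)

    gd365 = s(2.0 * math.pi * (day - GD) / 365.0)
    gd182 = s(4.0 * math.pi * (day - GD) / 365.0)
    gd91 = s(8.0 * math.pi * (day - GD) / 365.0)
    gh24 = s(2.0 * math.pi * (hour - GH) / 24.0)
    ed365 = s(2.0 * math.pi * (day - ED) / 365.0)
    eh24 = s(2.0 * math.pi * (hour - EH) / 24.0)

    gamma = (-0.336 + 0.00074 * day - 0.003 * hour
             - 0.054 * gd365 - 0.069 * gd182 - 0.203 * gh24
             - 0.062 * gd365 * gd91 + 0.078 * gd182 * gd91)
    eta = 0.196 - 0.00005 * day + 0.002 * hour + 0.067 * ed365 + 0.066 * eh24
    z = gamma + eta * math.log(x / (1.0 - x))
    return gamma, eta, NormalDist().cdf(z)


# November 10, 1982 is day 314 of a 365-day year.
DAY_OF_YEAR = date(1982, 11, 10).timetuple().tm_yday

if __name__ == "__main__":
    for mode in (True, False):
        g, e, p = bedford_model_ii(DAY_OF_YEAR, 6.0, 0.85, degree_mode=mode)
        print(mode, round(g, 3), round(e, 3), round(p, 2))
```

With `degree_mode=True` the sketch gives γ of about -.13, η of about .196 and a probability of about .58, in line with the printed example. With radian sines the same coefficients give a probability of about .53, which is closer to the Model I value of .51.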

5.  Tables  for  the  Models 

For each station, Model I gives the values of γ and η in the formula

$$Z = \gamma + \eta \ln[x/(1-x)].$$

Model II gives the expressions for γ and η to be used in the above formula.
The following notation is used:

GD365   is   $\sin[2\pi(D - \mathrm{GD})/365]$
GD182   is   $\sin[4\pi(D - \mathrm{GD})/365]$
GD91    is   $\sin[8\pi(D - \mathrm{GD})/365]$
GH24    is   $\sin[2\pi(H - \mathrm{GH})/24]$
GH6     is   $\sin[8\pi(H - \mathrm{GH})/24]$
ED365   is   $\sin[2\pi(D - \mathrm{ED})/365]$
ED182   is   $\sin[4\pi(D - \mathrm{ED})/365]$
ED91    is   $\sin[8\pi(D - \mathrm{ED})/365]$


D, H are input day and hour respectively. (Note 0 < D ≤ 365 and 0 < H ≤ 24.)

GD,  GH,  ED,  EH  are  "phase  shifts"  whose  values  for  a particular  station  are 
tabulated. 

It should be noted that the values stated under "Error Information" are,
strictly speaking, not probabilities but the relative frequencies with which the
"residual" for the fit exceeds the stated error.
Station 14601  Bangor, Maine (CONT.)

Model II (Comprehensive Johnson)

γ = -.405 + .00036 D - .001 H - .063 GD365 - .102 GD182
    - .224 GH24 + .091 GD182 * GD91

η = .217 - .00006 D + .001 H + .074 ED365 + .074 EH24

GD = 83, GH = 5.7, ED = 113, EH = 6.8

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .803
 .02          .615
 .03          .442
 .04          .284
 .05          .180
 .06          .114
 .07          .058
 .08          .021
 .09          .014

RMS Error = .037


Station 14702  Bedford, Mass. (CONT.)

Model II (Comprehensive Johnson)

γ = -.336 + .00074 D - .003 H - .054 GD365 - .069 GD182
    - .203 GH24 - .062 GD365 * GD91 + .078 GD182 * GD91

η = .196 - .00005 D + .002 H + .067 ED365 + .066 EH24

GD = 81, GH = 5.6, ED = 110, EH = 6.4

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .780
 .02          .578
 .03          .396
 .04          .273
 .05          .166
 .06          .090
 .07          .053
 .08          .029
 .09          .016

RMS Error = .036


Station 26435  Nenana, Alaska (CONT.)

Model II (Comprehensive Johnson)

γ = -.320 - .00035 D - .005 H - .319 GD365 - .132 GH24

η = .191 + .002 H + .091 ED365 + .026 EH24

GD = 120, GH = 5.3, ED = 100, EH = 5.7

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .809
 .02          .623
 .03          .473
 .04          .323
 .05          .212
 .06          .127
 .07          .069
 .08          .030
 .09          .018

RMS Error = .040
Station 12867  Patrick AFB (CONT.)

Model II (Comprehensive Johnson)

γ = -.018 - .138 GD365 + .091 GD182 - .332 GH24
    - .141 GD365 * GH24

η = .344 + .106 ED365 + .073 EH24

GD = 105, GH = 8.0, ED = 118, EH = 6.0

RMS Error = .054
Station 41008  Saigon, Vietnam (CONT.)

Model II (Comprehensive Johnson)

γ = -.532 - .0025 D - .008 H - .878 GD365 - .327 GH24

η = .914 + .003 H + .185 ED365

GD = 108, GH = 8, ED = 125, EH = 0

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .694
 .02          .590
 .03          .527
 .04          .466
 .05          .415
 .06          .368
 .07          .327
 .08          .297
 .09          .259

RMS Error = .092
Station 45715  Shemya, Alaska (CONT.)

Model II (Comprehensive Johnson)

γ = -1.672 + .0017 D - .002 H - .327 GD365 - .174 GD182
    - .115 GH24 + .138 GD365 * GH24 + .124 GD182 * GD91

η = .344 + .003 H - .131 ED365 + .056 EH24 - .028 ED365 * EH24
    + .030 ED182 * ED91

GD = 108, GH = 6, ED = 108, EH = 6

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .608
 .02          .405
 .03          .292
 .04          .208
 .05          .149
 .06          .100
 .07          .077
 .08          .064
 .09          .058

RMS Error = .042
Station 33123  Tripoli, Libya (CONT.)

Model II (Comprehensive Johnson)

γ = .973 - .0021 D - .003 H + .761 GD365 + .156 GD182
    - .183 GH24 + .337 GD365 * GD182

η = .286 + .00016 D + .001 H

GD = 147, GH = 5, ED = 192, EH = 5

Error Information

  X      Prob(Abs. Error ≥ X)

 .01          .842
 .02          .702
 .03          .556
 .04          .438
 .05          .323
 .06          .244
 .07          .185
 .08          .149
 .09          .112

RMS Error = .054
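
The "Error Information" columns tabulated for each station can be produced from a list of residuals (observed minus fitted cumulative frequency) as in the sketch below. The function name is hypothetical, and counting a residual whose absolute value equals the threshold as an exceedance is an assumption of this sketch.

```python
def exceedance_frequencies(residuals, thresholds=None):
    """Relative frequency with which |residual| reaches each stated error X.

    Returns a list of (X, frequency) pairs for X = .01, .02, ..., .09 unless
    other thresholds are supplied.
    """
    if thresholds is None:
        thresholds = [k / 100 for k in range(1, 10)]
    n = len(residuals)
    if n == 0:
        raise ValueError("at least one residual is required")
    return [(t, sum(1 for r in residuals if abs(r) >= t) / n) for t in thresholds]
```

Because the frequencies are cumulative from the top, each column decreases as X increases, as it does in every table above.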